Zipf's Law and the Frequency of Characters or Words of Oracles

Author

  • Xiuli Wang
Abstract

The article discusses the frequency of characters in Oracle texts. It concludes that the relation between the frequency and the rank of a word or character fits the Zipf-Mandelbrot law, that is, Zipf's law with three parameters, and it estimates those parameters from the frequency data. It also points out that what some researchers of Oracle call "assembling on the two ends" is only an impression-based description of the Oracle data.
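To make the three-parameter form concrete, here is a minimal sketch in Python of fitting a Zipf-Mandelbrot curve to a ranked frequency list. It assumes the common parametrization f(r) = c / (r + b)^a; the abstract does not state the paper's own parametrization or fitting procedure, so the function names, the grid of candidate b values, and the log-space least-squares criterion are all illustrative assumptions, and the data in the usage lines is synthetic.

```python
import math


def zipf_mandelbrot(rank, c, b, a):
    """Predicted frequency under the assumed form f(r) = c / (r + b)**a."""
    return c / (rank + b) ** a


def fit_zipf_mandelbrot(freqs, b_grid=None):
    """Estimate (c, b, a) from positive frequencies sorted in descending order.

    For a fixed shift b, log f = log c - a * log(r + b) is linear in
    log(r + b), so a and log c follow from ordinary least squares.
    The shift b is chosen by a simple grid search (a hypothetical choice,
    not the paper's method). Needs at least two ranks.
    """
    if b_grid is None:
        b_grid = [i * 0.1 for i in range(101)]  # assumed search range 0.0 .. 10.0
    n = len(freqs)
    ys = [math.log(f) for f in freqs]
    mean_y = sum(ys) / n
    best = None
    for b in b_grid:
        xs = [math.log(r + b) for r in range(1, n + 1)]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Sum of squared residuals in log space for this candidate b.
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), b, -slope)
    _, c, b, a = best
    return c, b, a


if __name__ == "__main__":
    # Synthetic frequencies generated from known parameters, not Oracle data.
    sample = [zipf_mandelbrot(r, 1000.0, 2.0, 1.2) for r in range(1, 201)]
    print(fit_zipf_mandelbrot(sample))
```

Setting b = 0 reduces the curve to the plain two-parameter Zipf form, so comparing the residual at b = 0 with the best residual gives a rough sense of whether the extra parameter is doing any work.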


Similar resources

Extension of Zipf's Law to Word and Character N-grams for English and Chinese

It is shown that for a large corpus, Zipf's law for both words in English and characters in Chinese does not hold for all ranks. The frequency falls below the frequency predicted by Zipf's law for English words for rank greater than about 5,000 and for Chinese characters for rank greater than about 1,000. However, when single words or characters are combined together with n-gram words or chara...


Rank-frequency relation for Chinese characters

We show that the Zipf's law for Chinese characters perfectly holds for sufficiently short texts (few thousand different characters). The scenario of its validity is similar to the Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Z...


Zipf's Law and Statistical Data on Modern Tibetan

In this paper, a large-scale modern Tibetan text corpus is built, which includes about 190 thousand documents, 67.21 million words, and 93.66 million syllables in total. Based on the corpus, statistics are computed for several language units at different granularities. Statistical data show that: a syllable has 3.26 letters or 2.20 super characters on average, while a sentence has 75.40 letters or 63....


Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes

Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages like Chinese, Japanese and Korean. These languages consist of characters, and are of very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, a...


Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings

The word-frequency distribution of a text written by an author is well accounted for by a maximum entropy distribution, the RGF (random group formation)-prediction. The RGF-distribution is completely determined by the a priori values of the total number of words in the text (M), the number of distinct words (N) and the number of repetitions of the most common word (k(max)). It is here shown tha...



Journal title:
  • CoRR

Volume abs/1412.2821

Publication date 2014